Abstract

Paterson, Stinson and Wei [2] introduced combinatorial batch codes, which are a combinatorial description of batch codes. Batch codes were first presented by Ishai, Kushilevitz, Ostrovsky and Sahai [1] in STOC'04. In this paper we answer some of the questions put forward by Paterson, Stinson and Wei and give some results for the general case $t > 1$, which was not studied by the authors.






1 Introduction

Batch codes were proposed by Ishai, Kushilevitz, Ostrovsky and Sahai [1]. Suppose there exists a large database of $n$ items distributed among $m$ devices. A user wants to retrieve an arbitrary subset of $k$ items while reading at most $t$ items from each device. The problem is to distribute the items in the database such that the total storage $N$ is minimized. Batch codes are constructed for this purpose. Paterson, Stinson and Wei [2] gave a combinatorial formulation of the above problem and introduced the study of combinatorial batch codes. They studied the special case when $t = 1$. In this paper we extend their work to study combinatorial batch codes for arbitrary $t$. We also give solutions to some of the questions posed in [2].

Batch codes can be defined as follows:

Definition 1 [2, Section 1] An $(n, N, k, m, t)$ batch code over an alphabet $\Sigma$ encodes a string $x \in \Sigma^n$ into an $m$-tuple of strings $y_1, y_2, \ldots, y_m \in \Sigma^*$ (also referred to as servers) of total length $N$, such that for each $k$-tuple (batch) of distinct indices $i_1, i_2, \ldots, i_k \in \{1, 2, \ldots, n\}$, the entries $x_{i_1}, x_{i_2}, \ldots, x_{i_k}$ from $x$ can be decoded by reading at most $t$ symbols from each server.

Our aim is to minimize $N$. In [2], Paterson, Stinson and Wei initiated the study of combinatorial batch codes, which are a combinatorial formulation of the replication based batch codes proposed by Ishai, Kushilevitz, Ostrovsky and Sahai [1]. Combinatorial batch codes can be defined as follows:

Definition 2 [2, Section 1] An $(n, N, k, m, t)$ combinatorial batch code (CBC) is a set system $(X, \mathcal{B})$, where $X$ is a set of $n$ elements (called items), $\mathcal{B}$ is a collection of $m$ subsets of $X$ (called servers) and $N = \sum_{B \in \mathcal{B}} |B|$, such that for each $k$-subset $\{x_{i_1}, x_{i_2}, \ldots, x_{i_k}\} \subseteq X$ there exist subsets $C_i \subseteq B_i$, where $|C_i| \le t$, $i = 1, 2, \ldots, m$, such that

$$\{x_{i_1}, x_{i_2}, \ldots, x_{i_k}\} \subseteq \bigcup_{i=1}^{m} C_i.$$

Combinatorial batch codes can be represented as follows: the items are represented by points of the set system and the servers are represented by subsets of these points. We follow the conventions given in [2]. We consider the dual set system, in which the servers are represented by points and each item in the database is represented by a set (referred to as a block) containing the points (i.e., the servers) that store that item.

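Given a set system $(X, \mathcal{B})$ with $X = \{x_1, x_2, \ldots, x_n\}$ and $\mathcal{B} = \{B_1, B_2, \ldots, B_b\}$, the incidence matrix of $(X, \mathcal{B})$ is a $b \times n$ matrix $A = (a_{ij})$, where

$$a_{ij} = \begin{cases} 1 & \text{if } x_j \in B_i, \\ 0 & \text{if } x_j \notin B_i. \end{cases}$$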

Paterson, Stinson and Wei considered the special case where $t = 1$. They denoted an $(n, N, k, m, 1)$-CBC by $(n, N, k, m)$-CBC. We also follow this convention. Additionally, we denote an $(n, N, k, m, t)$-CBC by $(n, N, k, m)_t$-CBC for $t > 1$. An $(n, N, k, m)$-CBC is optimal if $N \le N'$ for all $(n, N', k, m)$-CBCs, and we denote the corresponding value of $N$ by $N(n, k, m)$. Similarly, we denote the optimal value of $N$ for an $(n, N, k, m)_t$-CBC by $N_t(n, k, m)$ for $t > 1$.

A set of positions in a matrix is called a transversal if it contains one position in each row and in each column. We state the important lemma given in [2].

Lemma 1 [2, Section 1.1] An $m \times n$ 0-1 matrix is an incidence matrix of an $(n, N, k, m)$-CBC if and only if, for any $k$ columns, there is a $k \times k$ submatrix which has at least one transversal containing $k$ ones.

Refer to Example 1. If we consider any four columns, say 12, 25, 40 and 55, then a transversal can be obtained as shown in Figure 1 below.

0100 
1001 
1100 
0111 
1010 
0011 

Figure 1: One way of choosing k columns. 
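The transversal condition of Lemma 1 can be checked mechanically. The following is a minimal sketch, assuming the matrix is given as a list of 0-1 rows; the function name `find_transversal` and the use of an augmenting-path bipartite matching are illustrative choices and are not taken from [2]. It is applied to the $6 \times 4$ submatrix of Figure 1.

```python
def find_transversal(matrix, columns):
    """Match each chosen column to a distinct row holding a one in it.

    Returns a dict {column: row}; its size equals len(columns) exactly when
    the chosen columns admit a transversal of ones.
    """
    column_in_row = {}  # row -> column currently matched to that row

    def augment(col, visited):
        # Augmenting-path step: try every row with a one in `col`.
        for row, entries in enumerate(matrix):
            if entries[col] == 1 and row not in visited:
                visited.add(row)
                if row not in column_in_row or augment(column_in_row[row], visited):
                    column_in_row[row] = col
                    return True
        return False

    for col in columns:
        augment(col, set())
    return {col: row for row, col in column_in_row.items()}


# The 6 x 4 submatrix of Figure 1 (columns 12, 25, 40 and 55 of Example 1).
FIGURE_1 = [[int(ch) for ch in line]
            for line in ("0100", "1001", "1100", "0111", "1010", "0011")]

transversal = find_transversal(FIGURE_1, range(4))
```

For a full $(n, N, k, m)$-CBC test one would run this check over every choice of $k$ columns, which is only practical for small parameters.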

The rate of a code is defined as the ratio $n/N$. We define column uniform batch codes as batch codes which have the same number of ones in all the columns. If the rate is $1/c$, then there are exactly $c$ ones in each column. This also implies that each item is replicated $c$ times and stored in $c$ servers. We denote by $n(m, c, k)$ and $n_t(m, c, k)$ the maximum values of $n$ for which there exists a uniform $(n, cn, k, m)$-CBC and a uniform $(n, cn, k, m)_t$-CBC respectively.

1.1 Previous results 

We present the results given by Paterson, Stinson and Wei [2]. 

1. $N(n, k, n) = n$.

2. $N(n, k, k) = kn - k(k - 1)$.

3. $N(n, k, n - 1) = n - 1 + k$.

4. If $n \ge (k - 1)\binom{m}{k-1}$, then $N(n, k, m) = kn - (k - 1)\binom{m}{k-1}$.

For fixed rate combinatorial batch codes, the authors stated the following results in [2].

1. $n(m, c, k) \le \dfrac{(k-1)\binom{m}{c}}{\binom{k-1}{c}}$.

2. $n(m, c, c + 1) = c\binom{m}{c}$.

3. $n(m, c, c + 2) = \binom{m}{c}$.

4. $n(m, 2, 4) = \binom{m}{2}$.

5. $[(m^2 - 1)/4] \le n(m, 2, 5) \le [(m^2 + 2m - 3)/4]$.

1.2 Our Results

In [2] the authors asked the following question: can $N(n, k, m)$ be computed for a range of values of $n$, where $n < (k - 1)\binom{m}{k-1}$? We answer this question by finding the optimal value $N(n, k, m)$ when $\binom{m}{k-2} \le n \le (k - 1)\binom{m}{k-1}$. We also present some column uniform batch codes with fixed rate 1/3. This question was also posed by Paterson, Stinson and Wei.

In their paper [2], the authors considered the case $t = 1$. We present some results for arbitrary $t$.

2 Optimal Value $N(n, m, k)$ when $\binom{m}{k-2} \le n \le (k - 1)\binom{m}{k-1}$

Paterson, Stinson and Wei gave some exact values of $N(n, m, k)$ for $n \ge (k - 1)\binom{m}{k-1}$. They showed that if $n \ge (k - 1)\binom{m}{k-1}$, then $N(n, k, m) = kn - (k - 1)\binom{m}{k-1}$.

We extend this result to values of $n$ such that $\binom{m}{k-2} \le n \le (k - 1)\binom{m}{k-1}$. We say that $s$ columns span $l$ rows if the $s$ blocks span $l$ points. This means that the $s$ blocks together contain $l$ points. Put another way, the $s$ items are contained in $l$ servers. First we try to find the exact values of $N(n, m, k)$ for $n$ close to $(k - 1)\binom{m}{k-1}$. We observe that when $n = (k - 1)\binom{m}{k-1}$, each column of the incidence matrix contains $k - 1$ ones and is repeated $k - 1$ times. The reason behind this is that any two distinct columns span at least $k$ rows. If $k$ columns are selected, at least two will be distinct and hence will span $k$ rows. We say that the $\binom{m}{k-1}$ distinct columns belong to the same group. So there are $k - 1$ groups. We denote a column $i$ of the incidence matrix of a CBC by $(a_{1,i}\, a_{2,i} \cdots a_{m,i})^T$.

If we consider all the $\binom{m}{k-1}$ columns in a given group, then the number of columns which have ones in the same $k - 1$ rows is 1, and the number of distinct columns which have ones in the same $k - 2$ rows is $m - (k - 2)$. The number of distinct columns which have ones in the same $i$ rows is $\binom{m-i}{k-1-i}$. Consider Example 1.

Example 1 Let $m = 6$, $k = 4$. Let $n = (k - 1)\binom{m}{k-1} = 60$. The 60 columns each contain three ones and three zeros.

If we consider all the 20 columns in a group, then the number of columns which have ones in the same $k - 1 = 3$ rows is 1. The number of columns which have ones in the same $k - 2 = 2$ rows is $m - (k - 2) = 4$. For example, columns 10, 16, 19 and 20 have ones in rows 5 and 6. No other column in the same group has ones in these two rows. The number of columns which have ones in the same $k - 3 = 1$ row is $\binom{m-(k-3)}{2} = 10$.
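The counts in Example 1 can be reproduced by enumeration. The sketch below is illustrative only: it assumes that each group lists the $(k-1)$-subsets of the rows in lexicographic order, an ordering that is not stated in the text but agrees with the column numbers quoted in Example 1 and Figure 1. The helper names are hypothetical.

```python
from itertools import combinations
from math import comb


def initial_columns(m, k):
    """Columns of the starting matrix: k-1 groups, each listing every
    (k-1)-subset of the rows 1..m (lexicographic order is an assumption)."""
    group = list(combinations(range(1, m + 1), k - 1))
    return group * (k - 1)


def columns_sharing_rows(m, k, rows):
    """Number of distinct columns in one group with ones in all given rows."""
    wanted = set(rows)
    return sum(1 for col in combinations(range(1, m + 1), k - 1)
               if wanted <= set(col))


m, k = 6, 4
columns = initial_columns(m, k)
# 1-based positions, within the first group, of the columns with ones in rows 5 and 6.
with_rows_5_6 = [j for j in range(1, comb(m, k - 1) + 1)
                 if {5, 6} <= set(columns[j - 1])]
```

Each column is stored as the tuple of rows in which it has a one, so column 12 is `(2, 3, 5)`, which matches the first column of Figure 1.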



While finding the values of $N(n, m, k)$ we decrease the value of $n$ by deleting columns at each step and modifying the existing columns such that any $k$ columns will span at least $k$ rows. We consider any column, say $(\underbrace{0 0 \cdots 0}_{m-k+1} \underbrace{1 1 \cdots 1}_{k-1})^T$. We consider all columns in the same group which contain $k - 2$ ones in the rows $m - k + 3, m - k + 4, \ldots, m$. There are $m - k + 2$ such distinct columns. Suppose we delete fewer than $m - k + 1$ of these distinct columns. Without loss of generality, let the remaining columns be $(1 \underbrace{0 0 \cdots 0}_{m-k+1} \underbrace{1 1 \cdots 1}_{k-2})^T$ and $(0 1 \underbrace{0 0 \cdots 0}_{m-k} \underbrace{1 1 \cdots 1}_{k-2})^T$.

Suppose we now change the first column to $(\underbrace{0 0 \cdots 0}_{m-k+2} \underbrace{1 1 \cdots 1}_{k-2})^T$. Then we find that if we select the column $(0 1 \underbrace{0 0 \cdots 0}_{m-k} \underbrace{1 1 \cdots 1}_{k-2})^T$ $k - 1$ times, once from each of the $k - 1$ groups, and the column $(\underbrace{0 0 \cdots 0}_{m-k+2} \underbrace{1 1 \cdots 1}_{k-2})^T$ once, then the $k$ columns do not span $k$ rows. Hence we cannot change the first column. The same holds for the second column. If we delete fewer than $m - k + 1$ columns, then we cannot change any of the other columns, because on reducing the number of ones we can always find another column having $k - 2$ ones in the same rows which, when taken $k - 1$ times from the $k - 1$ groups, will together with the changed column span fewer than $k$ rows.

This leads us to the following theorem.

Theorem 1 If $n \ge (k - 1)\binom{m}{k-1} - m + k$, then $N(n, m, k) = n(k - 1)$.

From the discussion above, the following observation can be made.

Observation 1 If $m - k + 1$ columns in the same group are deleted such that they have $k - 2$ ones in the same rows, then the remaining column which contains $k - 2$ ones in the same rows can be modified such that the $k - 2$ ones in those rows are unchanged and a zero is placed instead of the 1 in the remaining position. We show that if such a construction is made, then any $k$ columns will span $k$ rows.

We consider the three possible cases.

1. If we consider any $k$ columns such that no replica of the deleted or modified column(s) is chosen, then at least two of them will be distinct. As shown previously, they will span $k$ rows.

2. If the columns are selected such that a replica of a deleted column has been selected and the modified column is not selected, then at least two distinct columns will be selected. So any $k$ columns will span at least $k$ rows.

3. Now we consider the case in which the modified column is selected. Here two cases arise.

(a) $k - 2$ of the selected columns are the same as the modified column (chosen from the other groups) and one column is selected at random. Since there are at least two distinct columns containing $k - 1$ ones, the $k$ columns will span $k$ rows.

(b) The $k - 1$ columns are selected at random; then at least two will be distinct. So the $k$ columns will span $k$ rows.

We note that there are $k - 2$ duplicate columns which have ones in the same positions as the modified column. These $k - 2$ columns together with the modified column will span $k - 1$ rows. However, if we select any other column, then all the $k$ columns will span $k$ rows. From this the theorem follows.

Theorem 2 If $n = (k - 1)\binom{m}{k-1} - (m - k + 1)$, then $N(n, m, k) = n(k - 1) - 1$.

We now present a construction which gives us the optimal value for any $n$ where $\binom{m}{k-2} \le n \le (k - 1)\binom{m}{k-1}$. We then state the main theorem, which gives us all values of $N(n, m, k)$ for $\binom{m}{k-2} \le n \le (k - 1)\binom{m}{k-1}$.

We use a construction similar to that presented earlier in this section. We delete columns from the incidence matrix and see how we can reduce the number of ones in the existing columns.

Construction 1 Let the initial incidence matrix containing $(k - 1)\binom{m}{k-1}$ columns be denoted by $A$.

1. $m - k$ columns are deleted from the same group $g_1$, such that all have ones in the rows $m - k + 3, m - k + 4, \ldots, m$.

2. The columns $(\underbrace{0 \cdots 0}_{m-k+1} \underbrace{1 \cdots 1}_{k-1})^T$, $(0 \cdots 0\, 1 0\, 1 \cdots 1)^T$, $\ldots$, $(0 1 \underbrace{0 \cdots 0}_{m-k} \underbrace{1 \cdots 1}_{k-2})^T$ are deleted from the same group $g_1$ and the column $(1 \underbrace{0 \cdots 0}_{m-k+1} \underbrace{1 \cdots 1}_{k-2})^T$ in group $g_1$ is modified to $(\underbrace{0 \cdots 0}_{m-k+2} \underbrace{1 \cdots 1}_{k-2})^T$.

3. Similarly, the columns $(0 \cdots 0\, 1 1 0\, 1 \cdots 1)^T$, $(0 \cdots 0\, 1 0 1 0\, 1 \cdots 1)^T$, $\ldots$, $(0 1\, 0 \cdots 0\, 1 0\, 1 \cdots 1)^T$ are deleted from the same group $g_1$, the column $(\underbrace{0 \cdots 0}_{m-k+1} \underbrace{1 \cdots 1}_{k-1})^T$ is deleted from any other group $g_2$, and the column $(1 \underbrace{0 \cdots 0}_{m-k}\, 1 0 \underbrace{1 \cdots 1}_{k-3})^T$ in group $g_1$ is modified to $(\underbrace{0 \cdots 0}_{m-k+1}\, 1 0 \underbrace{1 \cdots 1}_{k-3})^T$. So $m - k + 1$ columns are deleted at this step.

4. Similarly, we can delete the columns $(0 \cdots 0\, 1 1 1 0\, 1 \cdots 1)^T$, $(0 \cdots 0\, 1 0 1 1 0\, 1 \cdots 1)^T$, $\ldots$, $(0 1\, 0 \cdots 0\, 1 1 0\, 1 \cdots 1)^T$ from the same group $g_1$ and the column $(\underbrace{0 \cdots 0}_{m-k+1} \underbrace{1 \cdots 1}_{k-1})^T$ from another group $g_3$ (we note that the corresponding column in $g_2$ has already been deleted in the previous step) and modify the column $(1 \underbrace{0 \cdots 0}_{m-k}\, 1 1 0 \underbrace{1 \cdots 1}_{k-4})^T$ in group $g_1$ to $(\underbrace{0 \cdots 0}_{m-k+1}\, 1 1 0 \underbrace{1 \cdots 1}_{k-4})^T$. So $m - k + 1$ columns are deleted at this step also.

5. Suppose we consider the columns having ones in the $(m - k - i + 2)$-th and $(m - k - i + 3)$-th rows. We delete $m - k - i + 1$ columns from the group $g_1$ and $i$ columns from other groups in a similar manner as above. We then modify the appropriate column.

6. At the end of $\binom{m-1}{k-2}$ steps we are left with $(k - 2)\binom{m-1}{k-2}$ columns containing $k - 1$ ones and $\binom{m-1}{k-2}$ columns containing $k - 2$ ones.

7. We delete $m - k + 1$ columns and modify one column in each step such that after $\binom{m-1}{k-3}$ steps we are left with $\binom{m}{k-2}$ columns each having $k - 2$ ones.

We present an example to demonstrate the construction.

Example 1 contd. If we delete fewer than $m - k + 1 = 3$ columns, say columns 59 and 60, then $n = 58$ and $N = n(k - 1)$. We delete the columns 56, 59, 60 and modify the column 50, such that it now contains 2 ones in rows 5 and 6. So now $n = 57$ and $N = n(k - 1) - 1$. We delete the columns 58, 55 and modify the column 49, such that it now contains 2 ones in rows 4 and 6. We see that if the columns 20, 40 and the modified columns 49 and 50 are selected, then the four columns span only three rows. By construction we delete column 40. So now $n = 54$ and $N = n(k - 1) - 2$. In the third step we delete the columns 57, 54 and 20 and modify the column 48 such that the new column has only two ones in rows 4 and 5. For $n = 55, 56$, $N = n(k - 1) - 2$. This gives $n = 51$ and $N = n(k - 1) - 3$. In the next few steps we delete the following columns. We represent this with the help of Table 1.

Lemma 2 Construction 1 gives a CBC with optimal $N$ for $\binom{m}{k-2} \le n \le (k - 1)\binom{m}{k-1}$.

Step   Columns deleted   Column modified   Column after modification   n    N
1      56, 59, 60        50                $(000011)^T$                57   $n(k-1) - 1$
2      58, 55, 40        49                $(000101)^T$                54   $n(k-1) - 2$
3      57, 54, 20        48                $(000110)^T$                51   $n(k-1) - 3$
4      53, 39, 38        47                $(001001)^T$                48   $n(k-1) - 4$
5      52, 19, 37        46                $(001010)^T$                45   $n(k-1) - 5$
6      51, 18, 17        45                $(001100)^T$                42   $n(k-1) - 6$
7      36, 35, 33        44                $(010001)^T$                39   $n(k-1) - 7$
8      16, 34, 32        43                $(010010)^T$                36   $n(k-1) - 8$
9      15, 14, 31        42                $(010100)^T$                33   $n(k-1) - 9$
10     13, 12, 11        41                $(011000)^T$                30   $n(k-1) - 10$
11     1, 2, 3           4                 $(110000)^T$                27   $n(k-1) - 11$
12     5, 6, 21          7                 $(101000)^T$                24   $n(k-1) - 12$
13     8, 22, 25         9                 $(100100)^T$                21   $n(k-1) - 13$
14     23, 26, 28        10                $(100010)^T$                18   $n(k-1) - 14$
15     24, 27, 29        30                $(100001)^T$                15   $n(k-1) - 15$

Table 1: Steps of Construction 1



Proof: We first show that the construction gives a valid CBC and then show that the construction gives the optimal value of $N$ for any $n$ in the given range.

From Theorem 2.9 and Lemma 3.5 given in [2] we find that the optimal values for $n = (k - 1)\binom{m}{k-1}$ and $n = \binom{m}{k-2}$ hold. Steps 1 and 2 have been shown to result in a CBC with optimal $N$ (refer to Theorem 1 and Theorem 2). The reason for selecting $m - k$ columns from the same group and one column from another group for deletion in Step 3 is that, if we choose the column $(\underbrace{0 \cdots 0}_{m-k+1}\, 1 0 \underbrace{1 \cdots 1}_{k-3})^T$, the column $(\underbrace{0 \cdots 0}_{m-k+2} \underbrace{1 \cdots 1}_{k-2})^T$, and the column $(\underbrace{0 \cdots 0}_{m-k+1} \underbrace{1 \cdots 1}_{k-1})^T$ $k - 2$ times, then the $k$ columns will span only $k - 1$ rows, which is undesirable. So we have to delete another column $(\underbrace{0 \cdots 0}_{m-k+1} \underbrace{1 \cdots 1}_{k-1})^T$ from one of the $k - 2$ groups (say group $g_2$). For a similar reason we delete $m - k$ columns from the same group and one of the $(\underbrace{0 \cdots 0}_{m-k+1} \underbrace{1 \cdots 1}_{k-1})^T$ columns in Step 4.

We see that we cannot further modify any column at each step, in a manner similar to Observation 1 following Theorem 1.

We have already seen that there are $m - k + 2$ columns within the same group which have ones in the same $k - 2$ rows. When we consider the columns which have a one in the $(m - k + 2)$-th row in the same group, we note that $i$ such columns in the same group have already been deleted in the previous steps. So we delete $m - k + 1 - i = m - k - i + 1$ of the remaining columns having a one in the $(m - k - i + 2)$-th row and $k - 2$ ones in the same $k - 2$ rows. We modify the one remaining column. However, in this process we can observe that $k$ columns can be selected such that they span only $k - 1$ rows. Hence $i$ other columns have to be deleted from other groups (as done in Step 4). Thus at each step we delete $m - k + 1$ columns.

After $\binom{m-1}{k-2}$ steps we will be left with $(k - 1)\binom{m-1}{k-2}$ columns, such that $\binom{m-1}{k-2}$ columns have $k - 2$ ones and the rest have $k - 1$ ones (with a one in the first row). Now we start deleting and modifying columns from the $(k - 2)\binom{m-1}{k-2}$ columns such that at each step $m - k + 1$ columns are deleted and one column is modified. So after $\binom{m-1}{k-3}$ steps, $\binom{m-1}{k-3}$ new columns will have been modified. Thus there will now be $\binom{m-1}{k-3} + \binom{m-1}{k-2} = \binom{m}{k-2}$ columns, each having $k - 2$ ones.

At each step we get a CBC with the optimal value of $N$ for the given $n$. So the entire construction gives a CBC with optimal $N$. ■

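From the above lemma we arrive at the following theorem.

Theorem 3 If $\binom{m}{k-2} \le n \le (k - 1)\binom{m}{k-1}$, $d = (k - 1)\binom{m}{k-1} - n$ and $l = \left\lfloor \frac{d}{m-k+1} \right\rfloor$, then

$$N(n, m, k) = (k - 1)n - l.$$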
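Theorem 3 can be evaluated directly. The sketch below takes $d = (k-1)\binom{m}{k-1} - n$ and reads the bracket in the definition of $l$ as a floor of $d/(m-k+1)$; both readings are assumptions, chosen because they reproduce the values of $n$ and $N$ in Example 1 and in the steps of Construction 1. The function name is hypothetical, and $m \ge k$ is assumed so that the divisor is positive.

```python
from math import comb


def optimal_storage(n, m, k):
    """N(n, m, k) as given by Theorem 3, under the floor reading of l."""
    upper = (k - 1) * comb(m, k - 1)
    lower = comb(m, k - 2)
    if not lower <= n <= upper:
        raise ValueError("n lies outside the range covered by Theorem 3")
    d = upper - n              # number of columns removed from the initial matrix
    l = d // (m - k + 1)       # completed delete-and-modify steps
    return (k - 1) * n - l


# Example 1: m = 6, k = 4.
storage_after_step_1 = optimal_storage(57, 6, 4)    # n(k-1) - 1
storage_after_step_15 = optimal_storage(15, 6, 4)   # n(k-1) - 15
```

For $n = 15$ the value is $30$, which agrees with 15 columns of two ones each at the end of the construction.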

We next discuss column uniform batch codes of rate 1/3.

3 Column Uniform Batch Codes with rate 1/3

Column uniform batch codes with rate 1/3 are CBCs in which every block of the dual set system contains precisely 3 points. This means that every item is stored in exactly three servers.

We think of the points as vertices of a graph. Any three points which lie in the same block correspond to a triangle in the graph. For any block $b$ such that $a_{x,b} = a_{y,b} = a_{z,b} = 1$ and $a_{i,b} = 0$ for $i \in \{1, 2, \ldots, m\} \setminus \{x, y, z\}$, there is a triangle $xyz$ in the corresponding graph. From Corollaries 3.3 and 3.5 of [2] we see that

1. $n(m, 3, 4) = 3\binom{m}{3}$

2. $n(m, 3, 5) = \binom{m}{3}$

We next try to find some values for $n(m, 3, 6)$. The question of finding a valid $(n, 3n, k, m)$-CBC thus boils down to finding a graph such that any $k$ triangles in the graph span at least $k$ vertices. More importantly, we are interested in finding the graph on $m$ vertices which has the maximum number of triangles such that any $k$ triangles in the graph span at least $k$ vertices.

We give a construction for a column uniform CBC which has rate equal to 1/3. We note that the graph $K_4$ has four triangles. So any four triangles span at least four vertices. Let $m = 8$. Then the graph $G_8$ (refer to Figure 2) has 16 triangles such that any 6 triangles span at least 6 points. So there exists a $(16, 48, 6, 8)$-CBC.




Figure 2: The graph $G_8$.

Now we consider the graphs $G_8$ and $K_4$ as units. The units have to be interconnected as shown in Figure 3.

Figure 3: A unit consisting of $G_8$ and $K_4$.

We replace the graphs $G_8$ and $K_4$ by octagons and squares. If there are $g$ octagons and $s$ squares, then there will be $8g + 4s$ vertices and $16g + 4s + 4e$ triangles, where $e$ is the number of common edges between faces. The calculation of $e$ is done in the following way.




Figure 4: The pattern of octagon and square. 

We refer to Figure 4. Level $d$ denotes the octagons and squares at a distance $d$ from the central octagon. Consider a graph with only one level. We see that the number of octagons and squares at distance one is four each. The number of common edges between faces is 16. We thus note that $g = 5$, $s = 4$ and $e = 16$. So if $m = 8 \cdot 5 + 4 \cdot 4 = 56$, then the number of triangles present in the original graph is $16 \cdot 5 + 4 \cdot 4 + 64 = 160$. So $n(56, 3, 6) \ge 160$. For a graph with level two, the number of octagons and squares at distance two is 8 each. So $g = 13$, $s = 12$, $e = 56$, and $n(152, 3, 6) \ge 480$. To generalize, at distance $d$ the number of octagons and squares is $4d$ each.



In general, $g = 1 + 2d(d + 1)$, $s = 2d(d + 1)$ and

$$e = \frac{8(g - 4d) + 4(s - 4d)}{2} + 8d + \frac{4d + 3(4d - 4) + 4}{2}.$$

Let $m = 8g + 4s$; then $n(m, 3, 6) \ge 16g + 4s + 4e$, where $g = 1 + 2d(d + 1)$, $s = 2d(d + 1)$ and $e = 4g + 2s - 8d - 4$.
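The counting above can be checked numerically. The following sketch (the function name is illustrative) computes $m = 8g + 4s$ and the lower bound $16g + 4s + 4e$ for a pattern with $d$ levels, and also evaluates the longer expression for $e$ so that the two forms can be compared.

```python
def octagon_square_bound(d):
    """Return (m, bound) with n(m, 3, 6) >= bound for a pattern with d levels."""
    g = 1 + 2 * d * (d + 1)        # octagons
    s = 2 * d * (d + 1)            # squares
    e = 4 * g + 2 * s - 8 * d - 4  # common edges between faces
    return 8 * g + 4 * s, 16 * g + 4 * s + 4 * e


def common_edges_long_form(d):
    """The unsimplified expression for e, evaluated with integer arithmetic."""
    g = 1 + 2 * d * (d + 1)
    s = 2 * d * (d + 1)
    return ((8 * (g - 4 * d) + 4 * (s - 4 * d)) // 2
            + 8 * d
            + (4 * d + 3 * (4 * d - 4) + 4) // 2)


level_one = octagon_square_bound(1)
level_two = octagon_square_bound(2)
```

With $d = 0$ the function returns a single octagon, that is $m = 8$ and 16 triangles, in line with the $(16, 48, 6, 8)$-CBC obtained from $G_8$.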

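Given the value of $m$, we find a lower bound for $n(m, 3, 6)$ in the following way. The distance $d$ is first calculated from $m$; then $g = 1 + 2d(d + 1)$, $s = 2d(d + 1)$ and $e = 4g + 2s - 8d - 4$.

Let $A' = 16g + 4s + 4e$, $m' = m - (8g + 4s)$ and $j = m' \pmod 8$. Then

$$A = A' + \mathrm{oct}(j) + l'(j) + A'',$$

where

$$\mathrm{oct}(j) = \begin{cases} 0 & \text{for } j = 1, 2, \\ 1 & \text{for } j = 3, \\ 4 & \text{for } j = 4, \\ 5 & \text{for } j = 5, \\ 6 & \text{for } j = 6, \\ 10 & \text{for } j = 7, \\ 16 & \text{for } j = 8, \end{cases}
\qquad
l'(j) = \begin{cases} 1 & \text{for } j = 1, \\ 4 & \text{otherwise}, \end{cases}$$

$$A'' = \max_{i \ge 1} \{8s' + 20i + 4l\},$$

where $s' = \lfloor \cdots \rfloor$,

$$l = \begin{cases} 7 & \text{for } j = 1, 2 \text{ and } i + s' + 1 = 8, \\ u + s' + i & \text{for } j \ne 1, 2 \text{ and } i + u + s' = 8, \\ u + s' + i - 1 & \text{otherwise}, \end{cases}
\qquad
u = \begin{cases} 0 & \text{for } j = 1, 2, \\ 1 & \text{otherwise}. \end{cases}$$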

4 Optimal values of $N$ for $t \ge 2$

We generalize the problem to find $N_t(n, m, k)$, which was not done in [2].

From the above matrix we see that we can select four elements from any four columns by selecting a maximum of two elements from each row.

We give the optimal value when $m = n$. This is a trivial case, and for any $t$ we can state the following theorem.

Theorem 4 $N_t(n, k, n) = n$.

We next present the following example.

Example 2 The following represents the incidence matrix corresponding to $N_2(10, 5, 5)$.

From the example it is very easy to see that

Theorem 5 If $n \le tk$, then $N_t(n, k, k) = n$.
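The retrieval condition for general $t$ can be tested by brute force on small instances. The sketch below is illustrative only: the layout of the ten items (two per server, each stored once) is an assumed arrangement for the parameters $n = 10$, $k = 5$, $m = 5$, $t = 2$ and is not necessarily the matrix of Example 2. Each server is split into $t$ read slots and a batch is matched to slots by an augmenting-path bipartite matching; the function names are hypothetical.

```python
from itertools import combinations


def can_retrieve(servers, batch, t):
    """True if the batch can be read using at most t items per server."""
    item_in_slot = {}  # (server index, slot index) -> item read through that slot

    def augment(item, visited):
        for s, content in enumerate(servers):
            if item not in content:
                continue
            for copy in range(t):
                slot = (s, copy)
                if slot in visited:
                    continue
                visited.add(slot)
                if slot not in item_in_slot or augment(item_in_slot[slot], visited):
                    item_in_slot[slot] = item
                    return True
        return False

    return all(augment(item, set()) for item in batch)


def is_cbc(servers, n, k, t):
    """Check every k-subset of the items 0..n-1 (feasible for small n only)."""
    return all(can_retrieve(servers, batch, t)
               for batch in combinations(range(n), k))


# Assumed layout: server s stores items 2s and 2s + 1, so N = n = 10.
servers = [{2 * s, 2 * s + 1} for s in range(5)]
storage = sum(len(content) for content in servers)
```

With this layout every item is stored exactly once, so the total storage equals $n$; the same layout fails for $t = 1$ because two items on one server cannot both be read.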

Next we present results for column uniform batch codes with fixed rates. Let us consider the fixed rate 1/2, that is, $c = 2$. Suppose we can choose a maximum of $t$ elements from each server. We construct the incidence matrix such that each of the $\binom{m}{2}$ combinations is repeated. We can see that on choosing $2t$ columns, in the worst case we choose the same combination $2t$ times. Suppose this combination involves the first and the second rows. Then we can select $t$ ones each from the first and second rows. This essentially means we have chosen $2t$ items, taking $t$ items from each of the first and second servers. From this we arrive at the following theorem.

Theorem 6 $n_t(m, 2, 2t) = \infty$.

As a consequence, for $k < 2t$ we can choose the $k$ items in any way we like such that at most $t$ items are selected from each of the servers (rows).

Corollary 1 If $k < 2t$, $n_t(m, 2, k) = \infty$.

Theorem 7

1. $n_t(m, c, ct + i) = ct\binom{m}{c}$, for $1 \le i \le t$.

2. $n_t(m, c, ct + t + j) = t\binom{m}{c}$, for $1 \le j \le t$.

3. $n_t(m, c, k) \le \binom{m}{c}$, if $k > tm$.

Proof: There are $\binom{m}{c}$ distinct columns each having $c$ ones. We show that $n_t(m, c, ct + i) \ge ct\binom{m}{c}$, for $i \le t$. We can replicate each of the $\binom{m}{c}$ columns $ct$ times. Now if we select any $ct$ columns which are replicas, then we can choose $t$ ones from each of the $c$ rows and get a valid CBC. So $n_t(m, c, ct) \ge ct\binom{m}{c}$. If $k = ct + i$, $i \le t$, then we can choose $ct$ columns which are replicas and any other $i$ columns. Suppose we choose the column $(\underbrace{1 1 \cdots 1}_{c} \underbrace{0 0 \cdots 0}_{m-c})^T$ $ct$ times and the column $(\underbrace{1 1 \cdots 1}_{c-1}\, 0 1 \underbrace{0 0 \cdots 0}_{m-c-1})^T$ $i$ ($i \le t$) times; then the ones can be selected from the $c + 1$ rows a maximum of $t$ times each. So we get a valid CBC. Any other choice of $ct + i$ columns will also give rise to a valid CBC. So $n_t(m, c, ct + i) \ge ct\binom{m}{c}$ when $i \le t$.

We show that $n_t(m, c, ct + i) \le ct\binom{m}{c}$ when $i \le t$. Suppose at least one of the $\binom{m}{c}$ columns is repeated $ct + 1$ times. For $k > ct$, let the chosen columns include all the $ct + 1$ columns. Then at least one server contributes more than $t$ items, which is a contradiction. So $n_t(m, c, ct + i) \le ct\binom{m}{c}$ when $i \le t$.

Hence Part 1 of the theorem follows.

The proof of Part 2 is similar to that of Part 1.

Consider $k > tm$. If $n_t(m, c, k) > \binom{m}{c}$, then any $k$ columns will span only $m$ rows. Since $k > tm$, at least one row must contribute more than $t$ elements, which violates the condition of combinatorial batch codes. ■

Corollaries 3.3 and 3.5 of [2] are special cases of Theorem 7 when $t = 1$.

Comment We give below a table which compares the storage $N$ for $t = 1$ and $t = 2$. We observe that the storage required when $t = 1$ is about twice that required when $t = 2$. This shows that storage efficiency can be improved by increasing the number of probes. However, we note that for load balancing the number of probes cannot be too large.

n     m    k    N (t = 1)   N (t = 2)
180   10   5    640         360
180   10   6    684         360
720   10   7    4185        2160
240   10   9    1860        720



From Theorem 7 it follows that

1. $n_2(m, 2, 5) = 4\binom{m}{2}$

2. $n_2(m, 2, 6) = 4\binom{m}{2}$

3. $n_2(m, 2, 7) = 2\binom{m}{2}$

4. $n_2(m, 2, 8) = 2\binom{m}{2}$
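The closed forms of Parts 1 and 2 of Theorem 7 are easy to tabulate. The sketch below is a hedged illustration: it assumes the index ranges $1 \le i \le t$ and $1 \le j \le t$, which is the reading that yields the four values listed above, and the function name is hypothetical.

```python
from math import comb


def uniform_max_items(m, c, t, k):
    """n_t(m, c, k) for ct < k <= ct + 2t, following Parts 1 and 2 of Theorem 7."""
    if c * t < k <= c * t + t:          # k = ct + i with 1 <= i <= t
        return c * t * comb(m, c)
    if c * t + t < k <= c * t + 2 * t:  # k = ct + t + j with 1 <= j <= t
        return t * comb(m, c)
    raise ValueError("k is outside the range covered by Parts 1 and 2")


# Values for m = 10 servers and t = 2 probes per server.
items_c2_k5 = uniform_max_items(10, 2, 2, 5)
items_c3_k7 = uniform_max_items(10, 3, 2, 7)
items_c3_k9 = uniform_max_items(10, 3, 2, 9)
```

Since each item is stored $c$ times, the corresponding storage is $N = cn$, for example $2 \cdot 180 = 360$ for $c = 2$, $k = 5$.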

We next try to find the values of $n_2(m, 2, 9)$ and $n_2(m, 2, 10)$. We note that $n_2(m, 2, k) \le \binom{m}{2}$ for $k > 10$.

We look at the graphs in Figures 5(a) and 5(b). The graphs do not have any triangle or square.



(a) Square and triangle free (b) Square and triangle free 
graph on 9 vertices graph on 11 vertices 

Figure 5: Square triangle free graphs 

From this the following Lemma 3 follows. The lemma is important in proving Theorem 8.

Lemma 3 Let $G$ be a graph on $v$ vertices. Then the maximum number of edges such that $G$ does not contain any triangle or square is at least $2 + 3\lfloor (v - 3)/2 \rfloor$.
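Whether a given graph is free of triangles and squares can be tested directly. The sketch below is illustrative and assumes a simple graph given as a list of edges; it uses the facts that a triangle exists exactly when two adjacent vertices share a neighbour, and a square exists exactly when two distinct vertices share at least two neighbours. The Petersen graph serves as a well-known test case with no triangle or square; it is not one of the graphs of Figure 5.

```python
from itertools import combinations


def has_triangle_or_square(edges):
    """True if the simple graph contains a cycle of length 3 or 4."""
    adjacent = {}
    for u, v in edges:
        adjacent.setdefault(u, set()).add(v)
        adjacent.setdefault(v, set()).add(u)
    for u, v in combinations(adjacent, 2):
        common = adjacent[u] & adjacent[v]
        if len(common) >= 2:            # u - x - v - y - u is a square
            return True
        if common and v in adjacent[u]:  # u, v and a common neighbour form a triangle
            return True
    return False


def lemma_3_edge_count(v):
    """The number of edges 2 + 3 * floor((v - 3) / 2) appearing in Lemma 3."""
    return 2 + 3 * ((v - 3) // 2)


# Petersen graph: outer 5-cycle, five spokes, inner pentagram.
petersen = ([(i, (i + 1) % 5) for i in range(5)]
            + [(i, i + 5) for i in range(5)]
            + [(5 + i, 5 + (i + 2) % 5) for i in range(5)])
```

For the vertex counts of Figure 5, `lemma_3_edge_count` gives 11 edges for $v = 9$ and 14 edges for $v = 11$.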

Theorem 8

1. $n_2(m, 2, 9) \ge \binom{m}{2} + 2\lfloor m/3 \rfloor$.

2. $n_2(m, 2, 10) \ge \binom{m}{2} + 2 + 3\lfloor (m - 3)/2 \rfloor$.

Proof: If we consider all possible combinations of two rows, then there are $\binom{m}{2}$ columns. We note that $\binom{5}{2} = 10$. So if 9 elements are selected, at least five servers must contribute a maximum of two elements each. So $n_2(m, 2, 9) \ge \binom{m}{2}$. Similarly, $n_2(m, 2, 10) \ge \binom{m}{2}$.

Let the incidence matrix containing all the $\binom{m}{2}$ columns be denoted by $A$. Now we try to find whether we can add more columns containing two ones each. We represent the new augmented incidence matrix by $A' = (a'_{i,j})$. We map the incidence matrix to a graph, such that columns represent the edges. An edge $i$ is incident on two vertices $e$ and $f$ if $a'_{e,i} = a'_{f,i} = 1$. We have considered the cases when all the $k$ items represent columns in $A$. Now we consider some items to be selected from $A'$ as well. $A'$ is a submatrix of $A$ constructed by taking some columns of $A$.

We first prove Part 2. We are interested in finding the maximum number of columns that we can select from $A$ to form $A'$ such that all the conditions of combinatorial batch codes are met. If all the $k$ elements are selected from $A'$, then we have a valid code. So we select some elements from $A$ and some from $A'$. In particular, we consider the condition where the same row is selected from the matrices $A$ and $A'$. The length of $A'$ can be analyzed once we consider this situation. We note that if we select items such that they belong to five or more servers, then it is clear that a maximum of two items are selected from each server. So we consider the situation in which items are selected from four servers. Now we try to see which columns can be replicated. At most six columns can be selected from $A$ such that the columns span only four rows. If another four columns are chosen such that they span the same rows, then the ten chosen columns will span at most four rows, giving an invalid CBC. We call such combinations the forbidden configurations. Thus columns can be replicated such that the underlying graph does not contain any of the forbidden configurations as a subgraph. The forbidden configurations are graphs on four vertices with four edges, as shown in Figure 6.




Figure 6: Forbidden configuration : graphs on four vertices which contain four edges. 

From the graphs we see that there can be no triangle or square. From Lemma 3, with $v = m$, we see that the maximum number of edges in a graph having no triangle or square is at least $2 + 3\lfloor (m - 3)/2 \rfloor$. Thus $n_2(m, 2, 10) \ge \binom{m}{2} + 2 + 3\lfloor (m - 3)/2 \rfloor$.

While finding a lower bound for $n_2(m, 2, 9)$ we see that the forbidden configurations are the ones shown in Figure 7. We look for graphs which do not have any of these configurations as subgraphs. Such graphs consist of disjoint paths of length two. The number of edges on $m$ vertices is $2\lfloor m/3 \rfloor$. So $n_2(m, 2, 9) \ge \binom{m}{2} + 2\lfloor m/3 \rfloor$. ■

Figure 7: Forbidden configuration: graphs on four vertices which contain three edges.

5 Conclusion and open problems 

In this paper we answer some of the questions put forward by Paterson, Stinson and Wei in [2]. We also study $N_t(n, m, k)$ for $t > 1$. The following questions remain unsolved.

1. What is the optimal value of $N(n, m, k)$ for $n < \binom{m}{k-2}$?

2. How close are the bounds for $(n, 3n, k, m)$-CBCs?

3. Finding optimal solutions for $N_t(n, m, k)$.